A Combined Theory Data-Driven Approach to Classifying Delinquency Risk in the Future of Families and Child Well-Being Study
Name: Nicholas Vietto
PhD Candidate at the University of Nebraska - Omaha
Research Interests: Biopsychosocial Criminology, Quantitative Methods, Data Visualization, Open-Science, Open-Source Software
Prior studies show that risk factors across multiple domains are associated with increased risk for delinquency.
Studies in this area commonly show that individual and socio-environmental differences are associated with risk for delinquency:
· Individual - cognitive (e.g., IQ) and trait measures
· Socio-environmental - parents, peers, and communities
A smaller body of research has shown that genetic variation is also associated with risk for delinquency
· Genetic variation associated with dopaminergic and serotonergic function
· Often in interaction with environmental risk factors (e.g., childhood adversity)
Computational methods used in prior work on risk for delinquency have methodological limitations, including an overreliance on mass-univariate testing (Dwyer et al., 2018).
Analyses with Machine Learning can improve our understanding of risk for delinquency:
Advanced Data Processing: Efficiently handles and analyzes large amounts of data to enhance predictive power.
Uncovering Complex Relationships: Identifies non-linear and higher-order interactions, especially in high-dimensional datasets (e.g., image or audio data), providing deeper insight into relationships among variables.
Enhanced Predictive Accuracy: Continuously refines predictions through iterative learning, improving overall accuracy over time.
Using data from the ABCD Study, Chan et al. (2023) applied a Feed-Forward Neural Network to classify conduct disorder (CD) in children, utilizing a multidomain approach.
Their findings revealed that a model incorporating social, psychological, and biological factors outperformed single-domain models in predicting CD, achieving 91.18% accuracy and an AUC of 0.957.
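The two metrics reported above can be illustrated with a minimal sketch. It is written in Python for illustration only and is not the code used by Chan et al. (2023). The labels and predicted probabilities are made up, and the 0.5 classification threshold is an assumption.

```python
def accuracy(y_true, y_prob, threshold=0.5):
    """Share of cases whose thresholded prediction matches the true label."""
    preds = [1 if p >= threshold else 0 for p in y_prob]
    return sum(int(p == t) for p, t in zip(preds, y_true)) / len(y_true)


def auc(y_true, y_prob):
    """Probability that a randomly chosen positive case receives a higher
    score than a randomly chosen negative case (ties count as one half)."""
    pos = [p for p, t in zip(y_prob, y_true) if t == 1]
    neg = [p for p, t in zip(y_prob, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = case, 0 = non-case) and model scores.
y_true = [1, 0, 1, 1, 0, 0]
y_prob = [0.9, 0.2, 0.6, 0.4, 0.5, 0.1]

print(f"accuracy = {accuracy(y_true, y_prob):.3f}")
print(f"AUC      = {auc(y_true, y_prob):.3f}")
```

Accuracy depends on the chosen threshold, whereas AUC summarizes how well the scores rank cases above non-cases across all thresholds, which is one reason both are commonly reported together.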
The current study generalizes the approach of Chan et al. (2023) to risk for delinquency.
Using the Future of Families and Child Wellbeing Study (FFCWS):
Expanded Sociological Domain: Incorporates rich socio-environmental predictors, including census-tract variables, labor-market measures, and proximity to gun-violence incidents.
Incorporating Genetic Data: Incorporates genes involved in the serotonergic and dopaminergic pathways to examine the role of polymorphic variation.
Classifying Delinquency Risk: Classifies delinquency risk rather than a CD diagnosis.
Framework:
Feed-Forward Neural Network using the {tidymodels} framework in R
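As a rough illustration of what a feed-forward network computes, here is a minimal forward pass in plain Python. It is not the {tidymodels} model from the study. The single hidden layer, ReLU activation, sigmoid output, and all weights are assumptions chosen only to make the example concrete.

```python
import math


def relu(x):
    """Rectified linear unit: negative inputs become zero."""
    return max(0.0, x)


def sigmoid(z):
    """Map a real-valued logit to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(features, hidden_weights, hidden_biases, output_weights, output_bias):
    """One hidden layer: weighted sums -> ReLU -> weighted sum -> sigmoid."""
    hidden = [
        relu(sum(w * x for w, x in zip(row, features)) + b)
        for row, b in zip(hidden_weights, hidden_biases)
    ]
    logit = sum(w * h for w, h in zip(output_weights, hidden)) + output_bias
    return sigmoid(logit)


# Hypothetical inputs: two standardized features and hand-picked weights.
features = [1.0, 2.0]
hidden_weights = [[0.5, -0.5], [1.0, 0.0]]
hidden_biases = [0.0, 0.0]
output_weights = [1.0, 0.0]
output_bias = 0.0

probability = forward(features, hidden_weights, hidden_biases,
                      output_weights, output_bias)
print(f"predicted probability = {probability:.2f}")
```

In practice the weights are learned from the training data rather than set by hand, and the predicted probability is compared with a threshold to assign a risk class.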
Data Spending:
2128 observations
60/20/20 (1276/426/426) split for training, validation, and testing
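The split sizes above can be reproduced with a short sketch. It is written in Python for illustration, not with the {tidymodels} functions used in the study. The rounding rule (round the training share down, then divide the remainder between validation and testing) and the seed are assumptions chosen to match the reported counts.

```python
import random


def three_way_split(n_obs, train_pct=60, val_pct=20, seed=2024):
    """Return shuffled, non-overlapping index lists for training,
    validation, and testing."""
    indices = list(range(n_obs))
    random.Random(seed).shuffle(indices)  # hypothetical seed for reproducibility

    # Integer arithmetic avoids floating-point surprises.
    n_train = n_obs * train_pct // 100                  # 2128 * 60 // 100 = 1276
    remaining = n_obs - n_train                         # 852
    n_val = remaining * val_pct // (100 - train_pct)    # 852 * 20 // 40 = 426

    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]                    # the rest: 426
    return train, val, test


train_idx, val_idx, test_idx = three_way_split(2128)
print(len(train_idx), len(val_idx), len(test_idx))
```

Keeping the testing set untouched until the end, and tuning only against the validation set, is what allows the final performance estimate to reflect data the model has not seen.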
Feature Engineering Steps:
Socio-Environmental Domain
Parental Monitoring Scale (Focal Child, Year 15)
Neighborhood Collective Efficacy Scale (Focal Child, Year 15)
Conflict Tactics Scale (Focal Child, Year 15)
Material Hardship Scale (PCG, Year 15)
Psychological Domain
BSI 18 Anxiety Scale (Focal Child, Year 15)
Center for Epidemiologic Studies Depression Scale (CES-D) (Focal Child, Year 15)
Dickman’s Impulsivity Scale (Focal Child, Year 15)
Genetic Domain
Limitations:
Genetic Data Constraints: Genetic information is confined to markers from the candidate gene era, potentially limiting genomic coverage.
Sample Size: The sample is small relative to typical machine learning applications, which may limit the robustness and generalizability of the model.
Age of Assessment: Age 15 may be early for assessing delinquency risk, as behaviors predictive of long-term patterns may not yet be fully evident.
Future Directions:
Enhance Domain Optimization: Add features that improve the model’s performance within each domain (e.g., adding labor-market measures as distal predictors in the sociological domain).
Evaluate Fairness Across Ethnicities: Assess the final model’s performance across ethnic groups to verify that it is not biased against social or minority groups.
Test model on Year 22 data: Validate the model’s performance on the Year 22 data to assess its generalizability and predictive power.
Treat delinquency as a continuous outcome rather than using a categorical classification model.
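The subgroup evaluation proposed under "Evaluate Fairness Across Ethnicities" could start from something like the sketch below. It is written in Python for illustration. The group labels "A" and "B", the outcomes, and the predictions are all hypothetical, and accuracy is only one of several metrics that could be compared across groups.

```python
from collections import defaultdict


def accuracy_by_group(y_true, y_pred, groups):
    """Compute classification accuracy separately for each group label."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        totals[group] += 1
        hits[group] += int(truth == pred)
    return {group: hits[group] / totals[group] for group in totals}


# Hypothetical outcomes, predictions, and group membership.
y_true = [1, 0, 1, 0, 1, 0, 0, 1]
y_pred = [1, 0, 0, 0, 1, 1, 0, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

per_group = accuracy_by_group(y_true, y_pred, groups)
gap = max(per_group.values()) - min(per_group.values())
print(per_group, f"largest accuracy gap = {gap:.2f}")
```

A large gap between groups would flag the model for closer inspection, for example by comparing false positive rates, since a model can be equally accurate across groups while making different kinds of errors.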
Questions?
Data Modeling Culture
Primary Goal: Drawing causal inferences
Approach: Emphasizes deductive reasoning
Process: Models the data-generating process to clarify relationships between X and Y
Culture: Grounded in methodologies developed primarily by statisticians
Algorithm Modeling Culture
Primary Goal: Maximizing predictive accuracy
Approach: Emphasizes inductive reasoning, with a focus on learning patterns directly from data
Process: Utilizes black-box models to capture relationships between X and Y
Culture: Rooted in methodologies developed primarily by computer scientists